Many of the figures and analysis methodology were based off of this tutorial from NYU. https://learn.gencore.bio.nyu.edu/rna-seq-analysis/gene-set-enrichment-analysis/

Inputs Received

Information received from the csv samplesheet:

Validating input parameters…

## ✓ Sample 1 ('chen_phillips') parameters validated

All parameters validated successfully!

1. Chen & Phillips Analysis

Metadata information received from the tsv table:

##   replicate strain
## 1      rep1  hrde1
## 2      rep2  hrde1
## 3      rep3  hrde1
## 4      rep1  hrde2
## 5      rep2  hrde2
## 6      rep3  hrde2
## 7      rep1     wt
## 8      rep2     wt
## 9      rep3     wt

Analysis Methods

Here, count data from the RNA-seq experiment is read in the form of a counts matrix. Each column holds data from one sample, and each row represents a gene, such that the i-th row and n-th column tells how reads of gene i were measured in sample n. The values received should be un-normalized counts of sequencing reads or fragments.

# # Read Count Matrix
counts_file <- trimws(sample_params$salmon_merged_gene_counts_file_path)
count.matrix <- read_tsv(counts_file)
count.matrix <- count.matrix %>%
  mutate(across(3:ncol(.), ~ as.integer(.x))) %>%
  column_to_rownames("gene_id")

gene.reference <- dplyr::select(count.matrix, 1)

count.matrix <- count.matrix %>%
  dplyr::select(-1)

Information entered as metadata describes the samples (columns) of the count matrix. This metadata is combined with the sample/column names from the count matrix so the accuracy of the metadata can be reviewed. Take time to review the table below.

# Design ColData Matrix
sample <- colnames(count.matrix)
colData <- data.frame(sample)

# Add Columns for Condition to colData
for (i in colnames(metadata)){
  colData[i] <- metadata[i]
}

colData <- column_to_rownames(colData, "sample")
##            replicate strain 
## hrde1_rep1 "rep1"    "hrde1"
## hrde1_rep2 "rep2"    "hrde1"
## hrde1_rep3 "rep3"    "hrde1"
## hrde2_rep1 "rep1"    "hrde2"
## hrde2_rep2 "rep2"    "hrde2"
## hrde2_rep3 "rep3"    "hrde2"
## wt_rep1    "rep1"    "wt"   
## wt_rep2    "rep2"    "wt"   
## wt_rep3    "rep3"    "wt"

Using the count matrix, column metadata, and user-inputed design formula (expressing the variables to be used in modeling), a DESeq data set is created.

The experimental variable and the associated control/experimental conditions are given by the user in “ref_level_cond” and “ref_level_value” in the sampleInfo file respectively.

Pre-filtering is performed, keeping only genes that have a count of at least 10 in a minimum number of samples. The minimum number of samples is decided by calculating the number of times the reference level value (control condition) is listed in the metadata with the assumption that experimental conditions will be repeated the same number of times.

Standard differential expression analysis is performed with the DESeq function, and regularized log transformation (rlog) transforms count data to a log2 scale for PCA.

# Set Reference Level
reflevelcond <- trimws(sample_params$ref_level_cond)
reflevelvalue <- trimws(sample_params$ref_level_value)
# Run DESeq2 analysis
deseq_result <- runDESeq2Analysis(
  count.matrix = count.matrix,
  colData = colData,
  design = sample_params$design,
  ref_level_cond = reflevelcond,
  ref_level_value = reflevelvalue,
  metadata = metadata
)

dds <- deseq_result$dds
rld <- deseq_result$rld

If input data is designated to be batch corrected, the ComBat_seq() function from the SVA Bioconductor package is used. The batch (condition) to be regressed out is given in sampleInfo.csv as “batch_cond”, and the sample information for this condition is taken from the metadata.

The group argument of ComBat_seq() specifies biological covariates, whos signals should be preserved in adjusted data. All conditions (columns) from the metadata which are not to be regressed out are passed to the group argument. Differential expression analysis is otherwise performed as described above on batch-corrected counts.

if (as.logical(trimws(sample_params$batch_correct))) {
  # Replicate / Batch Correction
  batch <- trimws(sample_params$batch_cond)

  batch.correct <- ComBat_seq(
    counts = as.matrix(count.matrix),
    batch = metadata[[batch]],
    group = metadata[[colnames(metadata)[colnames(metadata) != batch]]]) %>%
    as.data.frame()

  # Run DESeq2 analysis on batch-corrected counts
  batch_deseq_result <- runDESeq2Analysis(
    count.matrix = batch.correct,
    colData = colData,
    design = sample_params$design,
    ref_level_cond = reflevelcond,
    ref_level_value = reflevelvalue,
    metadata = metadata
  )

  batch.correct.dds <- batch_deseq_result$dds
  batch.correct.rld <- batch_deseq_result$rld
  
  # Cleanup
  rm(batch, batch.correct, keep)
}

Reference level condition and value (ref_level_cond and ref_level_value respectively) inputs from sampleInfo.csv are used to create contrast terms to compare treatment samples with control samples. This relies on the design formula being formatted correctly to produce comparisons relevant to the experimental condition.

# # Find relevant contrasts
contrasts <- list()

# Factor values of experimental condition
pattern <- factor(metadata[[reflevelcond]])
comp <- levels(pattern)

# Remove control state from experimental condition vector
comp <- comp[comp != reflevelvalue]

# Create Contrasts
for (i in 1:length(comp)){
  contrasts[[i]] <- c(reflevelcond, comp[i], reflevelvalue)
}

Based on the number of experimental conditions (identified in the ref_level_cond column of the metadata) and the contrast terms comparing them to the experimental control (ref_level_value), differential expression results are extracted from the DESeq data set object.

# Find results from relevant contrasts
results <- list()

# Pull results from DESeq object with contrasts
for (i in 1:length(contrasts)){
  results[[i]] <- list()
  results[[i]][["DataFrame"]] <- results(dds.for.analysis, contrast = contrasts[[i]]) %>%
    data.frame() %>%
    rownames_to_column("gene_id") %>%
    mutate(gene_name = gene.reference[gene_id, 1], .before=baseMean) %>% # Add gene names to DESeq results matrices
    column_to_rownames("gene_id")
  
  results[[i]][["DESeqDataSet"]] <- results(dds.for.analysis, contrast = contrasts[[i]])
}

Principle Component Analysis

The Bioconductor package PCAtools is used to perform principle component analysis on regularized log transformed count data. Scree plots show principle component numbers on the x-axis, and their respective eigenvalues on the y-axis.

The third graph plots experimental metadata against a number of PCs and their gene loadings to visualize the agreement with the gene expression pattern of each condition for each PC.

No Regression

Replicate Regressed Out

strain_hrde1_vs_wt

DEG Report

Ontology Report

Transfering WORMBASE gene IDs to ENTREZID:

308 gene IDs were not transfered from WORMBASE to ENTREZID 
12108 gene IDs were successfully transfered from WORMBASE to ENTREZID 

Number of highly variable genes identified (with | L2FC | > 0.8 and padj < 0.1) is:

495

Number of highly upregulated genes identified (with L2FC > 0.8 and padj < 0.1) is:

377

Number of highly downregulated genes identified (with L2FC < -0.8 and padj < 0.1) is:

118

Dysregulated

Gene Ontology GSEA
protein phosphorylation

phosphorylation

dephosphorylation

ribosome biogenesis

tRNA metabolic process

collagen and cuticulin-based cuticle development

rRNA processing

male gamete generation

mitochondrial translation

RNA modification

cellular respiration

innate immune response

immune system process

mitochondrial gene expression

mitochondrial respirasome assembly

establishment of protein localization to membrane

immune response

developmental process involved in reproduction

cell fate commitment

neuropeptide signaling pathway

defense response

response to biotic stimulus

biological process involved in interspecies interaction between organisms

Reactome Pathway GSEA
Translation

Mitochondrial translation termination

Mitochondrial translation

Mitochondrial translation elongation

SRP-dependent cotranslational protein targeting to membrane

GTP hydrolysis and joining of the 60S ribosomal subunit

Formation of a pool of free 40S subunits

Nonsense Mediated Decay (NMD) independent of the Exon Junction Complex (EJC)

Metabolism of RNA

Physiological factors

Upregulated

GO Classification
GO Over-representation
Reactome Pathway Over-representation

Downregulated

GO Classification
GO Over-representation

Reactome Pathway Over-representation

Table Reactome Pathway Over-representation Analysis of Downregulated Genes has no rows.

strain_hrde2_vs_wt

DEG Report

Ontology Report

Transfering WORMBASE gene IDs to ENTREZID:

308 gene IDs were not transfered from WORMBASE to ENTREZID 
12108 gene IDs were successfully transfered from WORMBASE to ENTREZID 

Number of highly variable genes identified (with | L2FC | > 0.8 and padj < 0.1) is:

1557

Number of highly upregulated genes identified (with L2FC > 0.8 and padj < 0.1) is:

1051

Number of highly downregulated genes identified (with L2FC < -0.8 and padj < 0.1) is:

506

Dysregulated

Gene Ontology GSEA
dephosphorylation

defense response to bacterium

protein phosphorylation

phosphorylation

male gamete generation

neuropeptide signaling pathway

sexual reproduction

mitochondrial transmembrane transport

Reactome Pathway GSEA

Table Reactome Pathway GSE Of All Differentially Expressed Genes has no rows.

Upregulated

GO Classification
GO Over-representation
Reactome Pathway Over-representation

Downregulated

GO Classification
GO Over-representation

Reactome Pathway Over-representation

Session Information

This section provides complete information about the R environment, package versions, and system configuration used to generate this report. This information is critical for reproducing the analysis.

R Version and Platform:

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS 15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] here_1.0.2                  RColorBrewer_1.1-3          lubridate_1.9.4             forcats_1.0.0              
 [5] purrr_1.1.0                 tidyr_1.3.1                 tidyverse_2.0.0             patchwork_1.3.2            
 [9] rrvgo_1.18.0                EnhancedVolcano_1.24.0      org.Ce.eg.db_3.20.0         AnnotationDbi_1.68.0       
[13] vsn_3.74.0                  sva_3.54.0                  BiocParallel_1.40.2         genefilter_1.88.0          
[17] mgcv_1.9-3                  nlme_3.1-168                GOSemSim_2.32.0             DESeq2_1.46.0              
[21] SummarizedExperiment_1.36.0 Biobase_2.66.0              MatrixGenerics_1.18.1       matrixStats_1.5.0          
[25] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3         IRanges_2.40.1              S4Vectors_0.44.0           
[29] BiocGenerics_0.52.0         clusterProfiler_4.14.6      ReactomePA_1.50.0           readr_2.1.5                
[33] tibble_3.3.0                stringr_1.5.2               knitrBootstrap_1.0.3        PCAtools_2.18.0            
[37] ggrepel_0.9.6               ggplot2_4.0.0               htmltools_0.5.8.1           ashr_2.2-63                
[41] kableExtra_1.4.0            knitr_1.50                  DT_0.34.0                   dplyr_1.1.4                
[45] bookdown_0.44              

loaded via a namespace (and not attached):
  [1] fs_1.6.6                  enrichplot_1.26.6         httr_1.4.7                tools_4.4.0              
  [5] R6_2.6.1                  lazyeval_0.2.2            withr_3.0.2               graphite_1.52.0          
  [9] gridExtra_2.3             preprocessCore_1.68.0     cli_3.6.5                 textshaping_1.0.3        
 [13] labeling_0.4.3            slam_0.1-55               sass_0.4.10               SQUAREM_2021.1           
 [17] S7_0.2.0                  tm_0.7-16                 askpass_1.2.1             mixsqp_0.3-54            
 [21] systemfonts_1.2.3         yulab.utils_0.2.1         gson_0.1.0                DOSE_4.0.1               
 [25] svglite_2.2.1             R.utils_2.13.0            invgamma_1.2              limma_3.62.2             
 [29] rstudioapi_0.17.1         RSQLite_2.4.3             treemap_2.4-4             generics_0.1.4           
 [33] gridGraphics_0.5-1        crosstalk_1.2.2           vroom_1.6.5               GO.db_3.20.0             
 [37] Matrix_1.7-4              abind_1.4-8               R.methodsS3_1.8.2         lifecycle_1.0.4          
 [41] yaml_2.3.10               edgeR_4.4.2               qvalue_2.38.0             SparseArray_1.6.2        
 [45] grid_4.4.0                blob_1.2.4                promises_1.3.3            dqrng_0.4.1              
 [49] crayon_1.5.3              ggtangle_0.0.7            lattice_0.22-7            beachmat_2.22.0          
 [53] cowplot_1.2.0             annotate_1.84.0           KEGGREST_1.46.0           pillar_1.11.0            
 [57] fgsea_1.32.4              codetools_0.2-20          fastmatch_1.1-6           glue_1.8.0               
 [61] ggfun_0.2.0               data.table_1.17.8         vctrs_0.6.5               png_0.1-8                
 [65] treeio_1.30.0             gtable_0.3.6              cachem_1.1.0              xfun_0.53                
 [69] S4Arrays_1.6.0            mime_0.13                 tidygraph_1.3.1           survival_3.8-3           
 [73] pheatmap_1.0.13           statmod_1.5.1             ggtree_3.14.0             bit64_4.6.0-1            
 [77] rprojroot_2.1.1           bslib_0.9.0               affyio_1.76.0             irlba_2.3.5.1            
 [81] colorspace_2.1-2          DBI_1.2.3                 tidyselect_1.2.1          bit_4.6.0                
 [85] compiler_4.4.0            graph_1.84.1              xml2_1.4.0                NLP_0.3-2                
 [89] DelayedArray_0.32.0       scales_1.4.0              affy_1.84.0               rappdirs_0.3.3           
 [93] digest_0.6.37             rmarkdown_2.29            XVector_0.46.0            pkgconfig_2.0.3          
 [97] umap_0.2.10.0             sparseMatrixStats_1.18.0  fastmap_1.2.0             rlang_1.1.6              
[101] htmlwidgets_1.6.4         UCSC.utils_1.2.0          shiny_1.11.1              DelayedMatrixStats_1.28.1
[105] farver_2.1.2              jquerylib_0.1.4           jsonlite_2.0.0            R.oo_1.27.1              
[109] BiocSingular_1.22.0       magrittr_2.0.4            GenomeInfoDbData_1.2.13   ggplotify_0.1.2          
[113] wordcloud_2.6             Rcpp_1.1.0                ape_5.8-1                 viridis_0.6.5            
[117] reticulate_1.43.0         stringi_1.8.7             ggraph_2.2.2              zlibbioc_1.52.0          
[121] MASS_7.3-65               plyr_1.8.9                parallel_4.4.0            Biostrings_2.74.1        
[125] graphlayouts_1.2.2        splines_4.4.0             hms_1.1.3                 locfit_1.5-9.12          
[129] igraph_2.1.4              markdown_2.0              reshape2_1.4.4            ScaledMatrix_1.14.0      
[133] XML_3.99-0.19             evaluate_1.0.5            BiocManager_1.30.26       tzdb_0.5.0               
[137] tweenr_2.0.3              httpuv_1.6.16             openssl_2.3.3             polyclip_1.10-7          
[141] gridBase_0.4-7            ggforce_0.5.0             rsvd_1.0.5                xtable_1.8-4             
[145] reactome.db_1.89.0        RSpectra_0.16-2           tidytree_0.4.6            later_1.4.4              
[149] ragg_1.5.0                viridisLite_0.4.2         truncnorm_1.0-9           aplot_0.2.9              
[153] memoise_2.0.1             timechange_0.3.0         

Package Versions:

                Package           Version 
bookdown        "bookdown"        "0.44"  
clusterProfiler "clusterProfiler" "4.14.6"
DESeq2          "DESeq2"          "1.46.0"
dplyr           "dplyr"           "1.1.4" 
EnhancedVolcano "EnhancedVolcano" "1.24.0"
ggplot2         "ggplot2"         "4.0.0" 
org.Ce.eg.db    "org.Ce.eg.db"    "3.20.0"
PCAtools        "PCAtools"        "2.18.0"
ReactomePA      "ReactomePA"      "1.50.0"

Analysis Parameters:

Report generated: 2025-11-26 15:09:25.570865
Working directory: /Users/masonmatich/Documents/research_labs/projects/bulk_RNA_seq_workflow